Abstract

Gaussian errors are sometimes inappropriate in a multivariate linear regression setting because, for example, the data contain outliers. In such situations, it is often assumed that the error density is a scale mixture of multivariate normal densities that takes the form $f(\varepsilon) = \int_0^\infty |\Sigma|^{-1/2} u^{d/2} \phi_d\big(\Sigma^{-1/2} \sqrt{u}\, \varepsilon\big) h(u)\, du$, where $d$ is the dimension of the response, $\phi_d(\cdot)$ is the standard $d$-variate normal density, $\Sigma$ is an unknown $d \times d$ positive definite scale matrix, and $h(\cdot)$ is some fixed mixing density. Combining this alternative regression model with a default prior on the unknown parameters results in a highly intractable posterior density. Fortunately, there is a simple data augmentation (DA) algorithm and a corresponding Haar PX-DA algorithm that can be used to explore this posterior. This paper provides conditions (on $h$) for geometric ergodicity of the Markov chains underlying these Markov chain Monte Carlo (MCMC) algorithms. These results are extremely important from a practical standpoint because geometric ergodicity guarantees the existence of the central limit theorems that form the basis of all the standard methods of calculating valid asymptotic standard errors for MCMC-based estimators. The main result is that, if $h$ converges to 0 at the origin at an appropriate rate, and $\int_0^\infty u^{d/2} h(u)\, du < \infty$, then the DA and Haar PX-DA Markov chains are both geometrically ergodic. This result is quite far-reaching. For example, it implies the geometric ergodicity of the DA and Haar PX-DA Markov chains whenever $h$ is generalized inverse Gaussian, log-normal, inverted gamma (with shape parameter larger than $d/2$), or Fréchet (with shape parameter larger than $d/2$). The result also applies to certain subsets of the gamma, F, and Weibull families.
1 Introduction

Let $Y_1, Y_2, \dots, Y_n$ be independent $d$-dimensional random vectors from the multivariate linear regression model

$$ Y_i = \beta^T x_i + \Sigma^{1/2} \varepsilon_i , \tag{1} $$

where $x_i$ is a $p \times 1$ vector of known covariates associated with $Y_i$, $\beta$ is a $p \times d$ matrix of unknown regression coefficients, $\Sigma$ is an unknown positive definite scale matrix, and $\varepsilon_1, \dots, \varepsilon_n$ are iid errors.

In situations where Gaussian errors are inappropriate, e.g., when the data contain outliers, scale mixtures of multivariate normal densities constitute a rich class of alternative error densities (see, e.g., Andrews and Mallows, 1974; Fernández and Steel, 1999, 2000; West, 1984). These mixtures take the form

$$ f_h(\varepsilon) = \int_0^\infty \frac{u^{d/2}}{(2\pi)^{d/2}} \exp\Big\{ -\frac{u}{2}\, \varepsilon^T \varepsilon \Big\} h(u)\, du , $$

where $h$ is the density function of some positive random variable. We shall refer to $h$ as a mixing density. By varying the mixing density, one can construct error densities with many different types of tail behavior. A well-known example is that when $h$ is the density of a Gamma$(\nu/2, \nu/2)$ random variable, then $f_h$ becomes the multivariate Student's $t$ density with $\nu$ degrees of freedom, which, aside from a normalizing constant, is given by $\big[ 1 + \nu^{-1} \varepsilon^T \varepsilon \big]^{-(d+\nu)/2}$.
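The Student's $t$ example can be checked numerically. The sketch below is only an illustration: the function names, the log-scale trapezoid rule, and the grid limits are assumptions made here, not part of the paper. It approximates $f_h$ for a Gamma$(\nu/2, \nu/2)$ mixing density and compares $f_h(\varepsilon)/f_h(0)$ with the kernel $[1 + \nu^{-1}\varepsilon^T\varepsilon]^{-(d+\nu)/2}$.

```python
import math

def gamma_density(u, shape, rate):
    """Density of a Gamma(shape, rate) random variable at u > 0."""
    log_pdf = (shape * math.log(rate) + (shape - 1.0) * math.log(u)
               - rate * u - math.lgamma(shape))
    return math.exp(log_pdf)

def mixture_density(eps_sq, d, h, lo=-30.0, hi=8.0, n_grid=20000):
    """Approximate f_h at a point with eps^T eps = eps_sq.

    Uses the trapezoid rule in t = log(u); lo, hi and n_grid are
    illustrative tuning choices.
    """
    step = (hi - lo) / n_grid
    total = 0.0
    for k in range(n_grid + 1):
        u = math.exp(lo + k * step)
        # Integrand u^{d/2} (2 pi)^{-d/2} exp(-u eps_sq / 2) h(u), times du = u dt.
        value = ((u / (2.0 * math.pi)) ** (d / 2.0)
                 * math.exp(-0.5 * u * eps_sq) * h(u) * u)
        total += value * (0.5 if k in (0, n_grid) else 1.0)
    return total * step

def t_kernel(eps_sq, d, nu):
    """Unnormalized multivariate Student's t density."""
    return (1.0 + eps_sq / nu) ** (-(d + nu) / 2.0)

d, nu = 2, 4.0
h = lambda u: gamma_density(u, nu / 2.0, nu / 2.0)
ratio = mixture_density(3.0, d, h) / mixture_density(0.0, d, h)
```

Dividing by the value at $\varepsilon = 0$ removes the normalizing constant, so `ratio` should agree with `t_kernel(3.0, d, nu)` up to quadrature error.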

Let $Y$ denote the $n \times d$ matrix whose $i$th row is $Y_i^T$, and let $X$ stand for the $n \times p$ matrix whose $i$th row is $x_i^T$, and, finally, let $\varepsilon$ represent the $n \times d$ matrix whose $i$th row is $\varepsilon_i^T$. Using this notation, we can state the $n$ equations in (1) more succinctly as follows

$$ Y = X\beta + \varepsilon\, \Sigma^{1/2} . \tag{2} $$

Let $y$ and $y_i$ denote the observed values of $Y$ and $Y_i$, respectively.

Consider a Bayesian analysis of the data from the regression model (2) using an improper prior on $(\beta, \Sigma)$ that takes the form $\omega(\beta, \Sigma) \propto |\Sigma|^{-a} I_{S_d}(\Sigma)$, where $S_d \subset \mathbb{R}^{d(d+1)/2}$ denotes the space of $d \times d$ positive definite matrices. Taking $a = (d+1)/2$ yields the independence Jeffreys prior, which is a standard default prior for multivariate location scale problems. The joint density of the data from model (2) is, of course, given by

$$ f(y \mid \beta, \Sigma) = \prod_{i=1}^n \left[ \int_0^\infty \frac{u^{d/2}}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\Big\{ -\frac{u}{2} (y_i - \beta^T x_i)^T \Sigma^{-1} (y_i - \beta^T x_i) \Big\} h(u)\, du \right] . \tag{3} $$

Define

$$ m(y) = \int_{S_d} \int_{\mathbb{R}^{p \times d}} f(y \mid \beta, \Sigma)\, \omega(\beta, \Sigma)\, d\beta\, d\Sigma . $$

The posterior distribution is proper precisely when $m(y) < \infty$. Let $\Lambda$ denote the $n \times (p+d)$ matrix $(X : y)$. As we shall see, the following conditions are necessary for propriety:

(N1) $\mathrm{rank}(\Lambda) = p + d$;

(N2) $n > p + 2d - 2a$.

We assume throughout the paper that (N1) and (N2) hold. Under these two conditions, the Markov chain of interest is well-defined, and we can engage in a convergence rate analysis whether the posterior is proper or not. This is a subtle point upon which we will expand in Section 3.

Of course, when the posterior is proper, it is given by

$$ \pi^*(\beta, \Sigma \mid y) = \frac{f(y \mid \beta, \Sigma)\, \omega(\beta, \Sigma)}{m(y)} . $$

This density is (nearly always) intractable in the sense that posterior expectations cannot be computed in closed form. However, there is a well-known data augmentation algorithm (or two-variable Gibbs sampler) that can be used to explore this intractable posterior density (see, e.g., Liu, 1996). In order to state this algorithm, we must introduce some additional notation. For $z = (z_1, \dots, z_n)$, let $Q$ be an $n \times n$ diagonal matrix whose $i$th diagonal element is $z_i^{-1}$. Also, define $\Omega = (X^T Q^{-1} X)^{-1}$ and $\mu = (X^T Q^{-1} X)^{-1} X^T Q^{-1} y$. We shall assume throughout the paper that

$$ \int_0^\infty u^{d/2} h(u)\, du < \infty , $$

where $h$ is the mixing density, and we will refer to this condition as "condition $\mathcal{M}$." Finally, define a parametric family of univariate density functions indexed by $s > 0$ as follows

$$ \psi(u; s) = b(s)\, u^{d/2} e^{-su/2} h(u) , $$

where $b(s)$ is the normalizing constant. The data augmentation (DA) algorithm calls for draws from the inverse Wishart ($\mathrm{IW}_d$) and matrix normal ($\mathrm{N}_{p,d}$) distributions. The precise forms of the densities are given in the Appendix. We now present the DA algorithm. If the current state of the DA Markov chain is $(\beta_m, \Sigma_m) = (\beta, \Sigma)$, then we simulate the new state, $(\beta_{m+1}, \Sigma_{m+1})$, using the following three-step procedure.


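Iteration $m + 1$ of the DA algorithm:

1. Draw $\{Z_i\}_{i=1}^n$ independently with $Z_i \sim \psi\big(\cdot\,; (\beta^T x_i - y_i)^T \Sigma^{-1} (\beta^T x_i - y_i)\big)$, and call the result $z = (z_1, \dots, z_n)$.

2. Draw $\Sigma_{m+1} \sim \mathrm{IW}_d\big(n - p + 2a - d - 1, \big(y^T Q^{-1} y - \mu^T \Omega^{-1} \mu\big)^{-1}\big)$.

3. Draw $\beta_{m+1} \sim \mathrm{N}_{p,d}(\mu, \Omega, \Sigma_{m+1})$.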
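The three steps can be sketched in code for one concrete mixing density. The sketch below assumes $h$ is Gamma$(\nu/2, \nu/2)$, so that $\psi(\cdot\,; s)$ is Gamma$((\nu+d)/2)$ with rate $(\nu+s)/2$. It also assumes parametrizations that are not stated in this section (they are in the Appendix): $\mathrm{IW}_d(\eta, \Theta)$ is taken to have density proportional to $|\Sigma|^{-(\eta+d+1)/2}\exp\{-\tfrac12\mathrm{tr}(\Theta^{-1}\Sigma^{-1})\}$, and $\mathrm{N}_{p,d}(\mu,\Omega,\Sigma)$ is taken to have row covariance $\Omega$ and column covariance $\Sigma$. The names `draw_inverse_wishart` and `da_step` are hypothetical.

```python
import numpy as np

def draw_inverse_wishart(rng, df, S):
    """Draw Sigma with density prop. to |Sigma|^{-(df+d+1)/2} exp(-tr(S Sigma^{-1})/2).

    Sigma^{-1} is drawn as Wishart(df, S^{-1}) via the Bartlett decomposition.
    """
    d = S.shape[0]
    S = (S + S.T) / 2.0                      # guard against rounding asymmetry
    L = np.linalg.cholesky(np.linalg.inv(S))
    A = np.zeros((d, d))
    for i in range(d):
        A[i, i] = np.sqrt(rng.chisquare(df - i))
        A[i, :i] = rng.standard_normal(i)
    W = L @ A @ A.T @ L.T
    Sigma = np.linalg.inv(W)
    return (Sigma + Sigma.T) / 2.0

def da_step(rng, beta, Sigma, X, y, a, nu):
    """One DA iteration, assuming a Gamma(nu/2, nu/2) mixing density."""
    n, p = X.shape
    d = y.shape[1]
    resid = y - X @ beta
    r = np.einsum("ij,jk,ik->i", resid, np.linalg.inv(Sigma), resid)
    # Step 1: psi(.; r_i) is Gamma((nu + d)/2, rate (nu + r_i)/2) in this case.
    z = rng.gamma(shape=(nu + d) / 2.0, scale=2.0 / (nu + r))
    # Q^{-1} = diag(z), so X^T Q^{-1} X = X^T diag(z) X.
    XtQX = X.T @ (z[:, None] * X)
    Omega = np.linalg.inv(XtQX)
    mu = Omega @ (X.T @ (z[:, None] * y))
    S = y.T @ (z[:, None] * y) - mu.T @ XtQX @ mu
    # Step 2: inverse Wishart draw with n - p + 2a - d - 1 degrees of freedom.
    Sigma_new = draw_inverse_wishart(rng, n - p + 2 * a - d - 1, S)
    # Step 3: matrix normal draw with mean mu, row cov Omega, column cov Sigma_new.
    G = rng.standard_normal((p, d))
    beta_new = mu + np.linalg.cholesky(Omega) @ G @ np.linalg.cholesky(Sigma_new).T
    return beta_new, Sigma_new
```

For other mixing densities only the draw of `z` changes; steps 2 and 3 do not depend on $h$ once $z$ is given.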

Obviously, in order to run this algorithm, one must be able to make draws from $\psi(\cdot\,; s)$. When $h$ is a standard density, $\psi$ often turns out to be one as well. For example, when $h$ is a gamma density, $\psi$ is also gamma, and when $h$ is inverted gamma, $\psi$ is generalized inverse Gaussian (see Section 5). Even when $\psi$ is not a standard density, it is still a simple entity (a univariate density on $(0, \infty)$), and so is usually amenable to straightforward sampling. In particular, if it is possible to make draws from $h$, then $h$ can be used as the candidate in a simple rejection sampler for $\psi$.
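A minimal sketch of such a rejection sampler follows; the function name `draw_psi` and its signature are hypothetical. Since $\psi(u; s)/h(u) \propto u^{d/2} e^{-su/2}$, which for $s > 0$ is maximized at $u = d/s$, a candidate $u$ drawn from $h$ can be accepted with probability $u^{d/2} e^{-su/2} / \big[(d/s)^{d/2} e^{-d/2}\big]$.

```python
import math
import random

def draw_psi(s, d, draw_h, rng):
    """Rejection sampler for psi(u; s) prop. to u^{d/2} exp(-s u / 2) h(u).

    Requires s > 0. draw_h(rng) must return one draw from the mixing density h.
    """
    # log of the maximum of g(u) = u^{d/2} exp(-s u / 2), attained at u = d / s.
    log_g_max = (d / 2.0) * math.log(d / s) - d / 2.0
    while True:
        u = draw_h(rng)
        log_g = (d / 2.0) * math.log(u) - s * u / 2.0
        # 1 - rng.random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < log_g - log_g_max:
            return u
```

As a sanity check, if $h$ is Gamma$(\alpha, \gamma)$ (shape, rate), the accepted draws should follow a Gamma$(\alpha + d/2)$ distribution with rate $\gamma + s/2$. The acceptance rate can be poor when $s$ is large relative to the scale of $h$, which is one reason to prefer a direct sampler when $\psi$ is a standard density.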

Denote the DA Markov chain by $\Phi = \{(\beta_m, \Sigma_m)\}_{m=0}^\infty$. The main contribution of this paper is to demonstrate that $\Phi$ is geometrically ergodic as long as $h$ converges to zero at the origin at an appropriate rate. (A formal definition of geometric ergodicity is given in Section 3.) Our result is remarkable both for its simplicity and for its scope. Indeed, the conditions turn out to be extremely simple to check, and, at the same time, the result applies to a huge class of Monte Carlo Markov chains. It is well known among Markov chain Monte Carlo (MCMC) experts that establishing geometric ergodicity of practically relevant chains is extremely challenging. Thus, it is noteworthy that we are able to handle so many such chains simultaneously. Of course, the important practical and theoretical benefits of basing one's MCMC algorithm on a geometrically ergodic Markov chain have been well-documented by, e.g., Roberts and Rosenthal (1998), Jones and Hobert (2001) and Flegal et al. (2008). In order to give a precise statement of our main result, we now define three classes of mixing densities based on behavior near the origin.

Define $\mathbb{R}_+ = (0, \infty)$, and let $h : \mathbb{R}_+ \to [0, \infty)$ be a mixing density. If there is a $\delta > 0$ such that $h(u) = 0$ for all $u \in (0, \delta)$, then we say that $h$ is zero near the origin. Now assume that $h$ is strictly positive in a neighborhood of 0 (i.e., $h$ is not zero near the origin). If there exists a $c > -1$ such that

$$ \lim_{u \to 0} \frac{h(u)}{u^c} \in \mathbb{R}_+ , $$

then we say that $h$ is polynomial near the origin with power $c$. Finally, if for every $c > 0$, there exists an $\eta_c > 0$ such that the ratio $h(u)/u^c$ is strictly increasing in $(0, \eta_c)$, then we say that $h$ is faster than polynomial near the origin.

Every mixing density that is a member of a standard parametric family is either polynomial near the origin, or faster than polynomial near the origin. Indeed, the gamma, beta, F, Weibull, and shifted Pareto densities are all polynomial near the origin, whereas the inverted gamma, log-normal, generalized inverse Gaussian, and Fréchet densities are all faster than polynomial near the origin. We establish these facts in Section 5. Here is our main result.
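The two kinds of behavior near the origin can be seen numerically. The sketch below is illustrative only: the parameter values and names are arbitrary assumptions, and the densities are left unnormalized because the classification does not depend on the normalizing constant. It evaluates $h(u)/u^c$ for a gamma density (where the ratio settles to a positive constant) and for an inverted gamma density (where the ratio increases on a grid near zero).

```python
import math

def ratio_to_power(h, c, u):
    """The ratio h(u) / u^c used in the definitions above."""
    return h(u) / u ** c

# Unnormalized Gamma(alpha, gamma) and inverted gamma IG(alpha, gamma) densities.
alpha, gamma = 2.5, 1.0
gamma_h = lambda u: u ** (alpha - 1.0) * math.exp(-gamma * u)
inv_gamma_h = lambda u: u ** (-alpha - 1.0) * math.exp(-gamma / u)

# Gamma case: with c = alpha - 1 the ratio equals exp(-gamma u), which tends to 1.
small_u = [10.0 ** (-k) for k in range(2, 7)]
gamma_ratios = [ratio_to_power(gamma_h, alpha - 1.0, u) for u in small_u]

# Inverted gamma case: for c = 3 the ratio increases on (0, gamma / (alpha + c + 1)).
grid = [0.01, 0.02, 0.05, 0.1]
inv_gamma_ratios = [ratio_to_power(inv_gamma_h, 3.0, u) for u in grid]
```

A finite grid is, of course, only suggestive; the formal verification for each family is analytic.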


Theorem 1. Let $h$ be a mixing density that satisfies condition $\mathcal{M}$. Assume that $h$ is zero near the origin, or faster than polynomial near the origin, or polynomial near the origin with power $c > \frac{n - p + 2a - d - 1}{2}$. Then the posterior distribution is proper and the DA Markov chain is geometrically ergodic.


This result is more substantial than typical convergence rate results for DA algorithms and Gibbs samplers in the sense that it applies to a huge class of mixing densities, whereas typical results apply to relatively small parametric families of Markov chains (see, e.g., Pal and Khare, 2014). Note that, outside of the polynomial case, the only regularity condition in Theorem 1 is the rather weak requirement that $\int_0^\infty u^{d/2} h(u)\, du < \infty$. Thus, for example, Theorem 1 implies that if $h$ is generalized inverse Gaussian, log-normal, inverted gamma (with shape parameter larger than $d/2$), or Fréchet (with shape parameter larger than $d/2$), then the DA Markov chain converges at a geometric rate.

Another notable consequence of Theorem 1 is the following. Suppose that $h$ satisfies the conditions of Theorem 1 and let $B > 0$. Note that we can alter $h$ on the set $[B, \infty)$ in any way we like, and, as long as condition $\mathcal{M}$ continues to hold, the corresponding Markov chain will still be geometrically ergodic.

When $h$ is polynomial near the origin, there is an extra regularity condition for geometric ergodicity that can be somewhat restrictive. For example, take the case where $h$ is the gamma density with shape and rate both equal to $\nu/2$ (so the error density is Student's $t$ with $\nu$ degrees of freedom). In this case, Theorem 1 implies that the DA Markov chain will converge at a geometric rate as long as $\nu > n - p + 2a - d + 1$. If $n - p + 2a - d + 1$ is small, then this condition is not too troublesome. However, if this number happens to be large, then Theorem 1 applies only when the degrees of freedom of the $t$ distribution are large, which is not very useful. It is an open question whether the condition $c > \frac{n - p + 2a - d - 1}{2}$ is necessary.
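The arithmetic behind the Student's $t$ condition is short enough to spell out. In the sketch below (the function names are hypothetical), the Gamma$(\nu/2, \nu/2)$ density is polynomial near the origin with power $c = \nu/2 - 1$, so the requirement $c > (n - p + 2a - d - 1)/2$ is the same as $\nu > n - p + 2a - d + 1$.

```python
def power_threshold(n, p, d, a):
    """Lower bound (n - p + 2a - d - 1)/2 on the power c in Theorem 1."""
    return (n - p + 2 * a - d - 1) / 2

def student_t_covered(nu, n, p, d, a):
    """True if Theorem 1 applies to Student's t errors with nu degrees of freedom."""
    c = nu / 2 - 1  # power of the Gamma(nu/2, nu/2) density near the origin
    return c > power_threshold(n, p, d, a)
```

For instance, with $n = 20$, $p = 3$, $d = 2$ and the independence Jeffreys prior ($a = 3/2$), the threshold on $c$ is $8.5$, so the theorem covers only $\nu > 19$.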

A couple of special cases of Theorem 1 have appeared previously in the literature. In particular, the result for the gamma mixing density described above was established by Roy and Hobert (2010) in the special case of the independence Jeffreys prior where $a = (d + 1)/2$. Also, Jung and Hobert (2014) showed that, when $d = 1$ and the mixing density is inverted gamma with shape parameter larger than $1/2$, the Markov operator associated with the DA Markov chain is a trace-class operator, which implies that the corresponding chain converges at a geometric rate.

It is often possible to convert a DA algorithm into a Haar PX-DA algorithm that is theoretically superior to the underlying DA algorithm, yet essentially equivalent in terms of simulation effort (see, e.g., Hobert and Marchev, 2008; Liu and Wu, 1999). In fact, Roy and Hobert (2010) developed a Haar PX-DA variant of the DA algorithm described above for the special case in which $a = \frac{d+1}{2}$. It turns out that, when $a \neq \frac{d+1}{2}$, an additional regularity condition on $h$ is required in order to define this alternative algorithm. In particular, the Haar PX-DA algorithm can be defined only when

$$ \int_0^\infty t^{\,n + \frac{(d+1-2a)d}{2} - 1} \prod_{i=1}^n h(t z_i)\, dt < \infty \tag{4} $$

for (almost) all $z \in \mathbb{R}_+^n$. An argument similar to one used in Roy and Hobert (2010, Section 3) shows that (4) holds if

$$ \int_0^\infty u^{\frac{(d+1-2a)d}{2}} h(u)\, du < \infty . \tag{5} $$

Note that (5) always holds when $a = \frac{d+1}{2}$. Now assume that (4) holds, and define a parametric family of density functions, indexed by $z \in \mathbb{R}_+^n$, that take the form

$$ \xi(v; z) \propto v^{\,n + \frac{(d+1-2a)d}{2} - 1} \left[ \prod_{i=1}^n h(v z_i) \right] I_{\mathbb{R}_+}(v) . $$

As with the parametric family $\psi(\cdot\,; s)$, when $h$ is a standard density, $\xi$ often turns out to be standard as well. For example, if $h$ is gamma, inverted gamma, or generalized inverse Gaussian, then $\xi$ turns out to be a member of the same parametric family. If the current state of the Haar PX-DA Markov chain is $(\beta^*_m, \Sigma^*_m) = (\beta, \Sigma)$, then we simulate the new state, $(\beta^*_{m+1}, \Sigma^*_{m+1})$, using the following four-step procedure.

Iteration $m + 1$ of the Haar PX-DA algorithm:

1. Draw $\{Z'_i\}_{i=1}^n$ independently with $Z'_i \sim \psi\big(\cdot\,; (\beta^T x_i - y_i)^T \Sigma^{-1} (\beta^T x_i - y_i)\big)$, and call the result $z' = (z'_1, \dots, z'_n)$.

2. Draw $V \sim \xi(\cdot\,; z')$, call the result $v$, and set $z = (v z'_1, \dots, v z'_n)^T$.

3. Draw $\Sigma^*_{m+1} \sim \mathrm{IW}_d\big(n - p + 2a - d - 1, \big(y^T Q^{-1} y - \mu^T \Omega^{-1} \mu\big)^{-1}\big)$.

4. Draw $\beta^*_{m+1} \sim \mathrm{N}_{p,d}(\mu, \Omega, \Sigma^*_{m+1})$.
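The extra step 2 is easy to sketch when $h$ is Gamma$(\alpha, \gamma)$ (shape, rate). Assuming $\xi(v; z) \propto v^{\,n + (d+1-2a)d/2 - 1} \prod_{i=1}^n h(v z_i)$, one has $\prod_i h(v z_i) \propto v^{n(\alpha - 1)} e^{-\gamma v \sum_i z_i}$, so $\xi(\cdot\,; z)$ is a gamma density with shape $n\alpha + (d+1-2a)d/2$ and rate $\gamma \sum_i z_i$. The function name below is hypothetical and the code is a sketch under these assumptions, not the authors' implementation.

```python
import random

def haar_rescale_gamma(z_prime, alpha, gamma, d, a, rng):
    """Step 2 of the Haar PX-DA iteration for a Gamma(alpha, gamma) mixing density.

    Draws v from xi(.; z') and returns (v, z) with z = v * z'.
    """
    n = len(z_prime)
    shape = n * alpha + (d + 1 - 2 * a) * d / 2.0
    if shape <= 0:
        raise ValueError("xi is not a proper density for these parameter values")
    rate = gamma * sum(z_prime)
    v = rng.gammavariate(shape, 1.0 / rate)  # second argument is the scale
    return v, [v * zi for zi in z_prime]
```

The remaining steps are identical to those of the DA algorithm, applied to the rescaled vector $z$.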


Note that the only difference between this algorithm and the DA algorithm is one extra univariate draw (from $\xi(\cdot\,; \cdot)$) per iteration. Hence, the two algorithms are virtually equivalent from a computational standpoint. Theoretically, the Haar PX-DA algorithm is at least as good as the DA algorithm (Hobert and Marchev, 2008; Khare and Hobert, 2011; Liu and Wu, 1999), and there is empirical evidence that the Haar PX-DA algorithm can be far superior (see, e.g., Meng and van Dyk, 1999; van Dyk and Meng, 2001). The following corollary to Theorem 1 is an immediate consequence of the fact that, in general, the norm of the Markov operator of a Haar PX-DA chain is no larger than that of the underlying DA chain.


Corollary 1. Let $h$ be a mixing density that satisfies condition $\mathcal{M}$, and assume that (4) holds. Assume that $h$ is zero near the origin, or faster than polynomial near the origin, or polynomial near the origin with power $c > \frac{n - p + 2a - d - 1}{2}$. Then the Haar PX-DA Markov chain is geometrically ergodic.

The remainder of this paper is organized as follows. Section 2 contains a brief description of the latent data model that leads to the DA algorithm, as well as a formal definition of the DA Markov chain. Section 3 contains a drift and minorization analysis of $\Phi$ that culminates in a simple sufficient condition for geometric ergodicity that depends only on $h$. This result is used to prove Theorem 1 in Section 4. In Section 5, we consider the implications of Theorem 1 when $h$ is a member of one of the standard parametric families, and we also develop conditions under which a mixture of mixing densities leads to a geometrically ergodic DA Markov chain. Finally, the Appendix contains the definitions of the inverse Wishart ($\mathrm{IW}_d$) and matrix normal ($\mathrm{N}_{p,d}$) densities.


2 The latent data model and the DA Markov chain 


In order to formally define the Markov chain that the DA algorithm simulates, we must introduce the latent data model. Suppose that, conditional on $(\beta, \Sigma)$, $\{(Y_i, Z_i)\}_{i=1}^n$ are iid pairs such that

$$ Y_i \mid Z_i = z_i \sim \mathrm{N}_d\big(\beta^T x_i, \Sigma / z_i\big) $$
$$ Z_i \sim h . $$
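The latent data model is straightforward to simulate, which also gives a way to generate data from the regression model (1) with scale-mixture errors. In the sketch below, the function name and the `draw_h` callback are hypothetical; the Cholesky factor of $\Sigma$ is used in place of the symmetric square root, which yields the same distribution for $Y_i$.

```python
import numpy as np

def simulate_latent_data(rng, X, beta, Sigma, draw_h):
    """Draw (y, z) from the latent data model.

    draw_h(rng, n) must return n iid draws from the mixing density h.
    Row i of y is N_d(beta^T x_i, Sigma / z_i) given z_i.
    """
    n = X.shape[0]
    d = beta.shape[1]
    z = draw_h(rng, n)
    L = np.linalg.cholesky(Sigma)            # L L^T = Sigma
    noise = rng.standard_normal((n, d)) @ L.T / np.sqrt(z)[:, None]
    return X @ beta + noise, z
```

For example, passing `draw_h = lambda rng, n: rng.gamma(nu / 2, 2 / nu, size=n)` (shape $\nu/2$, scale $2/\nu$, i.e. rate $\nu/2$) produces Student's $t$ errors with $\nu$ degrees of freedom.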

Denote the joint density of $\{(Y_i, Z_i)\}_{i=1}^n$ by $f(y, z \mid \beta, \Sigma)$. It's easy to see that

$$ \int_{\mathbb{R}_+^n} f(y, z \mid \beta, \Sigma)\, dz = f(y \mid \beta, \Sigma) , $$

where the right-hand side is the joint density of the data defined at (3). Now define a (possibly improper) density on $\mathbb{R}^{p \times d} \times S_d \times \mathbb{R}_+^n$ as follows

$$ \pi(\beta, \Sigma, z \mid y) = f(y, z \mid \beta, \Sigma)\, \omega(\beta, \Sigma) , $$

and note that

$$ \int_{\mathbb{R}_+^n} \pi(\beta, \Sigma, z \mid y)\, dz = f(y \mid \beta, \Sigma)\, \omega(\beta, \Sigma) . \tag{6} $$

It follows that $\pi(\beta, \Sigma, z \mid y)$ is a proper density if and only if the posterior distribution is proper. Importantly, whether $\pi(\beta, \Sigma, z \mid y)$ is proper or not, conditions (N1) and (N2) guarantee that the corresponding "conditional" densities, $\pi(\beta, \Sigma \mid z, y)$ and $\pi(z \mid \beta, \Sigma, y)$, are well-defined. Indeed, $\pi(\beta, \Sigma \mid z, y) = \pi(\beta \mid \Sigma, z, y)\, \pi(\Sigma \mid z, y)$, and routine calculations show that $\pi(\beta \mid \Sigma, z, y)$ is a matrix normal density, and $\pi(\Sigma \mid z, y)$ is an inverse Wishart density. (The precise forms of these densities can be gleaned from the algorithm stated in the Introduction.) It is also straightforward to show that

$$ \pi(z \mid \beta, \Sigma, y) = \prod_{i=1}^n \psi(z_i; r_i) , $$

where $r_i = (\beta^T x_i - y_i)^T \Sigma^{-1} (\beta^T x_i - y_i)$ for $i = 1, 2, \dots, n$.

The DA algorithm simulates the Markov chain $\Phi = \{(\beta_m, \Sigma_m)\}_{m=0}^\infty$, whose state space is $\mathsf{X} := \mathbb{R}^{p \times d} \times S_d$, and whose Markov transition density (Mtd) is

$$ k(\beta, \Sigma \mid \tilde\beta, \tilde\Sigma) = \int_{\mathbb{R}_+^n} \pi(\beta, \Sigma \mid z, y)\, \pi(z \mid \tilde\beta, \tilde\Sigma, y)\, dz . $$

We suppress dependence on the data, $y$, since it is fixed throughout. Note that $\pi(\beta, \Sigma \mid z, y)$ and $\pi(z \mid \tilde\beta, \tilde\Sigma, y)$ are both strictly positive on $\mathsf{Z} = \{z \in \mathbb{R}_+ : h(z) > 0\}$, and $\mathsf{Z}$ has positive Lebesgue measure. Therefore, $k(\beta, \Sigma \mid \tilde\beta, \tilde\Sigma)$ is strictly positive on $\mathsf{X} \times \mathsf{X}$, which implies irreducibility and aperiodicity. It's easy to see that (6) is an invariant density for $\Phi$. Consequently, if the posterior is proper, then the chain's invariant density is the target posterior, $\pi^*(\beta, \Sigma \mid y)$, and the chain is positive recurrent. In fact, it is positive Harris recurrent (because $k$ is strictly positive).

We end this section by describing an interesting simplification that occurs in the special case where $a = (d + 1)/2$ and $n = p + d$. Roy and Hobert (2010) show that when $a = (d + 1)/2$, we have

$$ \pi(z \mid y) = \int_{S_d} \int_{\mathbb{R}^{p \times d}} \pi(\beta, \Sigma, z \mid y)\, d\beta\, d\Sigma \propto \frac{|X^T Q^{-1} X|^{\frac{n-p-d}{2}} \prod_{i=1}^n h(z_i)}{|Q|^{\frac{d}{2}}\, |\Lambda^T Q^{-1} \Lambda|^{\frac{n-p}{2}}} , $$

which is not necessarily integrable in $z$, because the posterior is not necessarily proper (see, e.g., Fernández and Steel, 1999). However, when $n = p + d$, $\Lambda$ is square and non-singular (because of (N1)), and we have the stunningly simple formula

$$ \pi(z \mid y) \propto \prod_{i=1}^n h(z_i) . $$

Consequently, when $a = (d + 1)/2$ and $n = p + d$, the posterior distribution is proper, and if we are able to draw from the mixing density, $h$, then we can make an exact draw from the posterior density by drawing sequentially from $\pi(z \mid y)$, $\pi(\Sigma \mid z, y)$, and $\pi(\beta \mid \Sigma, z, y)$, and then ignoring $z$.

In the next section, we develop a condition on $h$ that implies geometric ergodicity of the DA Markov chain, $\Phi$.


3 A Drift and Minorization Analysis of $\Phi$

Here we analyze the DA Markov chain via drift and minorization arguments. For background on these techniques, see Jones and Hobert (2001) and Roberts and Rosenthal (2004). Suppose that the posterior distribution is proper. Then the DA Markov chain $\Phi$ is geometrically ergodic if there exist $M : \mathsf{X} \to [0, \infty)$ and $\rho \in [0, 1)$ such that, for all $m \in \mathbb{N}$,

$$ \int_{S_d} \int_{\mathbb{R}^{p \times d}} \Big| k^m(\beta, \Sigma \mid \tilde\beta, \tilde\Sigma) - \pi^*(\beta, \Sigma \mid y) \Big|\, d\beta\, d\Sigma \le M(\tilde\beta, \tilde\Sigma)\, \rho^m , \tag{7} $$

where $k^m$ is the $m$-step Mtd. The quantity on the left-hand side of (7) is, of course, the total variation distance between the posterior distribution and the distribution of $(\beta_m, \Sigma_m)$ conditional on $(\beta_0, \Sigma_0) = (\tilde\beta, \tilde\Sigma)$. Here is the main result of this section.

Proposition 1. Let $h$ be a mixing density that satisfies condition $\mathcal{M}$. Suppose that there exist $\lambda \in \big[0, \frac{1}{n - p + 2a - 1}\big)$ and $L \in \mathbb{R}$ such that

$$ \frac{\int_0^\infty u^{\frac{d-2}{2}} e^{-\frac{su}{2}} h(u)\, du}{\int_0^\infty u^{\frac{d}{2}} e^{-\frac{su}{2}} h(u)\, du} \le \lambda s + L \tag{8} $$

for every $s > 0$. Then the posterior distribution is proper, and the DA Markov chain is geometrically ergodic.
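As a worked illustration of condition (8), which is not part of the original argument, suppose that $h$ is a Gamma$(\alpha, \gamma)$ density (shape $\alpha$, rate $\gamma$) with $\alpha + d/2 > 1$. Both integrals are then gamma integrals, and the ratio is exactly linear in $s$:

```latex
\frac{\int_0^\infty u^{\frac{d-2}{2}} e^{-\frac{su}{2}} h(u)\,du}
     {\int_0^\infty u^{\frac{d}{2}} e^{-\frac{su}{2}} h(u)\,du}
= \frac{\Gamma\!\left(\alpha + \tfrac{d}{2} - 1\right)
        \left(\gamma + \tfrac{s}{2}\right)^{-\left(\alpha + \frac{d}{2} - 1\right)}}
       {\Gamma\!\left(\alpha + \tfrac{d}{2}\right)
        \left(\gamma + \tfrac{s}{2}\right)^{-\left(\alpha + \frac{d}{2}\right)}}
= \frac{\gamma + \tfrac{s}{2}}{\alpha + \tfrac{d}{2} - 1}
= \frac{s}{2\alpha + d - 2} + \frac{2\gamma}{2\alpha + d - 2} .
```

So (8) holds with $\lambda = 1/(2\alpha + d - 2)$ and $L = 2\gamma/(2\alpha + d - 2)$, and the requirement $\lambda < 1/(n - p + 2a - 1)$ becomes $\alpha > (n - p + 2a - d + 1)/2$. With $\alpha = \nu/2$ this is the condition $\nu > n - p + 2a - d + 1$ quoted in the Introduction for Student's $t$ errors.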


Proof. We will prove the result by establishing a drift condition and an associated minorization condition, as in Rosenthal's (1995) Theorem 12. We begin by noting that the drift and minorization technique is applicable whether the posterior distribution is proper or not. (In more technical terms, it is not necessary to demonstrate that the Markov chain under study is positive recurrent before applying the technique.) Moreover, the DA Markov chain cannot be geometrically ergodic if the posterior is improper (since the corresponding chain is not positive recurrent). Hence, conditions that imply geometric ergodicity of the DA Markov chain simultaneously imply propriety of the corresponding posterior distribution.

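Our drift function, $V : \mathbb{R}^{p \times d} \times S_d \to \mathbb{R}_+$, is as follows

$$ V(\beta, \Sigma) = \sum_{i=1}^n (y_i - \beta^T x_i)^T \Sigma^{-1} (y_i - \beta^T x_i) . $$

Part I: Minorization. Fix $l > 0$ and define

$$ B_l = \big\{ (\beta, \Sigma) : V(\beta, \Sigma) \le l \big\} . $$

We will construct $\epsilon \in (0, 1)$ and a density function $f^* : \mathbb{R}^{p \times d} \times S_d \to [0, \infty)$ (both of which depend on $l$) such that, for all $(\tilde\beta, \tilde\Sigma) \in B_l$,

$$ k(\beta, \Sigma \mid \tilde\beta, \tilde\Sigma) \ge \epsilon f^*(\beta, \Sigma) . $$

This is the minorization condition. We note that it suffices to construct $\epsilon \in (0, 1)$ and a density function $\tilde f : \mathbb{R}_+^n \to [0, \infty)$ such that, for all $(\tilde\beta, \tilde\Sigma) \in B_l$,

$$ \pi(z \mid \tilde\beta, \tilde\Sigma, y) \ge \epsilon \tilde f(z) . $$

Indeed, if such an $\tilde f$ exists, then for all $(\tilde\beta, \tilde\Sigma) \in B_l$, we have

$$ k(\beta, \Sigma \mid \tilde\beta, \tilde\Sigma) = \int_{\mathbb{R}_+^n} \pi(\beta, \Sigma \mid z, y)\, \pi(z \mid \tilde\beta, \tilde\Sigma, y)\, dz \ge \epsilon \int_{\mathbb{R}_+^n} \pi(\beta, \Sigma \mid z, y)\, \tilde f(z)\, dz = \epsilon f^*(\beta, \Sigma) . $$

We now build $\tilde f$. Define $\tilde r_i = (y_i - \tilde\beta^T x_i)^T \tilde\Sigma^{-1} (y_i - \tilde\beta^T x_i)$, and note that

$$ \pi(z \mid \tilde\beta, \tilde\Sigma, y) = \prod_{i=1}^n \psi(z_i; \tilde r_i) = \prod_{i=1}^n b(\tilde r_i)\, z_i^{\frac{d}{2}} e^{-\frac{\tilde r_i z_i}{2}} h(z_i) . $$

Now, for any $s > 0$, we have

$$ b(s) = \frac{1}{\int_0^\infty u^{\frac{d}{2}} e^{-\frac{su}{2}} h(u)\, du} \ge \frac{1}{\int_0^\infty u^{\frac{d}{2}} h(u)\, du} . $$

By definition, if $(\tilde\beta, \tilde\Sigma) \in B_l$, then $\sum_{i=1}^n \tilde r_i \le l$, which implies that $\tilde r_i \le l$ for each $i = 1, \dots, n$. Thus, if $(\tilde\beta, \tilde\Sigma) \in B_l$, then for each $i = 1, \dots, n$, we have

$$ z_i^{\frac{d}{2}} e^{-\frac{\tilde r_i z_i}{2}} h(z_i) \ge z_i^{\frac{d}{2}} e^{-\frac{l z_i}{2}} h(z_i) . $$

Therefore,

$$ \pi(z \mid \tilde\beta, \tilde\Sigma, y) \ge \left[ \int_0^\infty u^{\frac{d}{2}} h(u)\, du \right]^{-n} \prod_{i=1}^n z_i^{\frac{d}{2}} e^{-\frac{l z_i}{2}} h(z_i) = \left[ \frac{\int_0^\infty u^{\frac{d}{2}} e^{-\frac{lu}{2}} h(u)\, du}{\int_0^\infty u^{\frac{d}{2}} h(u)\, du} \right]^n \prod_{i=1}^n \frac{z_i^{\frac{d}{2}} e^{-\frac{l z_i}{2}} h(z_i)}{\int_0^\infty u^{\frac{d}{2}} e^{-\frac{lu}{2}} h(u)\, du} := \epsilon \tilde f(z) . $$

Hence, our minorization condition is established.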

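Part II: Drift. To establish the required drift condition, we need to bound the expectation of $V(\beta_{m+1}, \Sigma_{m+1})$ given that $(\beta_m, \Sigma_m) = (\tilde\beta, \tilde\Sigma)$. This expectation is given by

$$ \int_{S_d} \int_{\mathbb{R}^{p \times d}} V(\beta, \Sigma)\, k(\beta, \Sigma \mid \tilde\beta, \tilde\Sigma)\, d\beta\, d\Sigma = \int_{\mathbb{R}_+^n} \left\{ \int_{S_d} \left[ \int_{\mathbb{R}^{p \times d}} V(\beta, \Sigma)\, \pi(\beta \mid \Sigma, z, y)\, d\beta \right] \pi(\Sigma \mid z, y)\, d\Sigma \right\} \pi(z \mid \tilde\beta, \tilde\Sigma, y)\, dz . $$

Calculations in Roy and Hobert's (2010) Section 4 show that

$$ \int_{S_d} \left[ \int_{\mathbb{R}^{p \times d}} V(\beta, \Sigma)\, \pi(\beta \mid \Sigma, z, y)\, d\beta \right] \pi(\Sigma \mid z, y)\, d\Sigma \le (n - p + 2a - 1) \sum_{i=1}^n \frac{1}{z_i} . $$

It follows from (8) that

$$ \begin{aligned} \int_{\mathbb{R}_+^n} \left\{ \int_{S_d} \left[ \int_{\mathbb{R}^{p \times d}} V(\beta, \Sigma)\, \pi(\beta \mid \Sigma, z, y)\, d\beta \right] \pi(\Sigma \mid z, y)\, d\Sigma \right\} \pi(z \mid \tilde\beta, \tilde\Sigma, y)\, dz &\le (n - p + 2a - 1) \int_{\mathbb{R}_+^n} \left[ \sum_{i=1}^n \frac{1}{z_i} \right] \pi(z \mid \tilde\beta, \tilde\Sigma, y)\, dz \\ &= (n - p + 2a - 1) \sum_{i=1}^n b(\tilde r_i) \int_0^\infty u^{\frac{d-2}{2}} e^{-\frac{\tilde r_i u}{2}} h(u)\, du \\ &\le (n - p + 2a - 1) \left( \lambda \sum_{i=1}^n \tilde r_i + nL \right) \\ &= \lambda (n - p + 2a - 1) V(\tilde\beta, \tilde\Sigma) + (n - p + 2a - 1) nL \\ &= \lambda' V(\tilde\beta, \tilde\Sigma) + L' , \end{aligned} $$

where $\lambda' := \lambda (n - p + 2a - 1) \in [0, 1)$ and $L' := (n - p + 2a - 1) nL$. Since the minorization condition holds for any $l > 0$, an appeal to Rosenthal's (1995) Theorem 12 yields the result. This completes the proof. □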
Remark 1. A straightforward argument shows that, if the mixing density $h(u)$ satisfies the conditions of Proposition 1, then so does every member of the corresponding scale family given by $\frac{1}{\sigma} h\big(\frac{u}{\sigma}\big)$ for $\sigma > 0$.

In the next section, we parlay Proposition 1 into a proof of Theorem 1. The key is to show that $h$ satisfies (8) as long as it converges to zero at the origin at an appropriate rate.


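4 Proof of Theorem 1

In this section, we prove three corollaries, which, taken together, constitute Theorem 1. There is one corollary for each of the three classes of mixing densities defined in the Introduction.

4.1 Case I: Zero near the origin

Corollary 2. Let $h$ be a mixing density that satisfies condition $\mathcal{M}$. If $h$ is zero near the origin, then the posterior distribution is proper and the DA Markov chain is geometrically ergodic.

Proof. Fix $s > 0$, and recall that $h(u) = 0$ for $u \in (0, \delta)$ for some $\delta > 0$. Hence,

$$ \frac{\int_0^\infty u^{\frac{d-2}{2}} e^{-\frac{su}{2}} h(u)\, du}{\int_0^\infty u^{\frac{d}{2}} e^{-\frac{su}{2}} h(u)\, du} = \frac{\int_\delta^\infty u^{\frac{d-2}{2}} e^{-\frac{su}{2}} h(u)\, du}{\int_\delta^\infty u \cdot u^{\frac{d-2}{2}} e^{-\frac{su}{2}} h(u)\, du} \le \frac{1}{\delta}\, \frac{\int_\delta^\infty u^{\frac{d-2}{2}} e^{-\frac{su}{2}} h(u)\, du}{\int_\delta^\infty u^{\frac{d-2}{2}} e^{-\frac{su}{2}} h(u)\, du} = \frac{1}{\delta} . $$

Thus, the conditions of Proposition 1 are satisfied (with $\lambda = 0$ and $L = 1/\delta$) and the proof is complete. □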


4.2 Case II: Polynomial near the origin 


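Fix $\lambda \in [0, \infty)$ and let $\mathcal{A}(\lambda)$ denote the set of mixing densities, $h$, for which there exists a constant, $k_\lambda$, such that

$$ \frac{\int_0^\infty \frac{1}{\sqrt{u}}\, e^{-\frac{su}{2}} h(u)\, du}{\int_0^\infty \sqrt{u}\, e^{-\frac{su}{2}} h(u)\, du} \le \lambda s + k_\lambda $$

for every $s > 0$. For each mixing density, $h$, we define

$$ \lambda_h = \inf \big\{ \lambda \in [0, \infty) : h \in \mathcal{A}(\lambda) \big\} . $$

If $h$ is not in $\mathcal{A}(\lambda)$ for any $\lambda \in [0, \infty)$, then we set $\lambda_h = \infty$. Here is an example. Suppose that $h$ is a Gamma$(\alpha, 1)$ density. If $\alpha > 1/2$, then routine calculations show that

$$ \frac{\int_0^\infty \frac{1}{\sqrt{u}}\, e^{-\frac{su}{2}} h(u)\, du}{\int_0^\infty \sqrt{u}\, e^{-\frac{su}{2}} h(u)\, du} = \frac{1}{2\alpha - 1}\, s + \frac{2}{2\alpha - 1} . \tag{9} $$

So, in this case, $\lambda_h = \frac{1}{2\alpha - 1}$. On the other hand, if $\alpha \in (0, 1/2]$, then $\lambda_h = \infty$.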
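Identity (9) can be confirmed by quadrature. The sketch below is an illustration with assumed names and grid limits: it integrates in $t = \log u$ with the trapezoid rule and compares the ratio of the two integrals for a Gamma$(\alpha, 1)$ density with $(s + 2)/(2\alpha - 1)$. The default lower limit is adequate for moderate $\alpha$; values of $\alpha$ close to $1/2$ would need a more negative `lo`.

```python
import math

def weighted_integral(power, s, alpha, lo=-60.0, hi=6.0, n_grid=30000):
    """Approximate int_0^inf u^power exp(-s u / 2) h(u) du for h = Gamma(alpha, 1).

    After substituting u = exp(t), the integrand is
    exp((power + alpha) t - (1 + s/2) exp(t)) / Gamma(alpha).
    """
    step = (hi - lo) / n_grid
    total = 0.0
    for k in range(n_grid + 1):
        t = lo + k * step
        log_val = (power + alpha) * t - (1.0 + s / 2.0) * math.exp(t) - math.lgamma(alpha)
        total += math.exp(log_val) * (0.5 if k in (0, n_grid) else 1.0)
    return total * step

def ratio_in_9(s, alpha):
    """Left-hand side of (9)."""
    return weighted_integral(-0.5, s, alpha) / weighted_integral(0.5, s, alpha)

def closed_form_9(s, alpha):
    """Right-hand side of (9); requires alpha > 1/2."""
    return s / (2.0 * alpha - 1.0) + 2.0 / (2.0 * alpha - 1.0)
```

Because the ratio is exactly linear in $s$ with slope $1/(2\alpha - 1)$, no smaller slope can work for all $s$, which is why $\lambda_h = 1/(2\alpha - 1)$.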

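Our next result shows that $\lambda_h$ is determined solely by the behavior of the density $h$ near 0.

Lemma 1. Suppose that $h$ and $\tilde h$ are two mixing densities that are both strictly positive in a neighborhood of zero. If

$$ \lim_{u \to 0} \frac{h(u)}{\tilde h(u)} \in (0, \infty) , $$

then $\lambda_h = \lambda_{\tilde h}$.

Proof. Assume that $\lambda_{\tilde h} < \infty$. We will show that $\lambda_h \le \lambda_{\tilde h}$. Fix $\lambda \in (\lambda_{\tilde h}, \infty)$ arbitrarily. Let $\lambda^* = (\lambda_{\tilde h} + \lambda)/2$. Since $\lim_{u \to 0} h(u)/\tilde h(u) \in (0, \infty)$, there exists $\eta > 0$ such that

$$ C_{1,\eta} < \frac{h(u)}{\tilde h(u)} < C_{2,\eta} \tag{10} $$

for every $u \in (0, \eta]$, where $C_{1,\eta}, C_{2,\eta} \in (0, \infty)$ satisfy $C_{2,\eta}/C_{1,\eta} = \sqrt{\lambda/\lambda^*} > 1$. Also, note that for such an $\eta$,

$$ \frac{\int_\eta^\infty \sqrt{u}\, e^{-\frac{su}{2}} \tilde h(u)\, du}{\int_0^\eta \sqrt{u}\, e^{-\frac{su}{2}} \tilde h(u)\, du} \le \frac{e^{-\frac{s\eta}{2}} \int_\eta^\infty \sqrt{u}\, \tilde h(u)\, du}{\int_0^{\eta/2} \sqrt{u}\, e^{-\frac{su}{2}} \tilde h(u)\, du} \le \frac{e^{-\frac{s\eta}{4}} \int_\eta^\infty \sqrt{u}\, \tilde h(u)\, du}{\int_0^{\eta/2} \sqrt{u}\, \tilde h(u)\, du} . $$

Consequently,

$$ \frac{\int_\eta^\infty \sqrt{u}\, e^{-\frac{su}{2}} \tilde h(u)\, du}{\int_0^\eta \sqrt{u}\, e^{-\frac{su}{2}} \tilde h(u)\, du} \to 0 \quad \text{as } s \to \infty , $$

so there exists $s_\eta > 0$ such that

$$ \frac{\int_0^\eta \sqrt{u}\, e^{-\frac{su}{2}} \tilde h(u)\, du}{\int_{\mathbb{R}_+} \sqrt{u}\, e^{-\frac{su}{2}} \tilde h(u)\, du} = 1 - \frac{\int_\eta^\infty \sqrt{u}\, e^{-\frac{su}{2}} \tilde h(u)\, du}{\int_{\mathbb{R}_+} \sqrt{u}\, e^{-\frac{su}{2}} \tilde h(u)\, du} \ge \sqrt{\frac{\lambda^*}{\lambda}} \tag{11} $$

for every $s > s_\eta$. It follows from (10) and (11) that for every $s > s_\eta$,

$$ \begin{aligned} \frac{\int_{\mathbb{R}_+} \frac{1}{\sqrt{u}}\, e^{-\frac{su}{2}} h(u)\, du}{\int_{\mathbb{R}_+} \sqrt{u}\, e^{-\frac{su}{2}} h(u)\, du} &= \frac{\int_0^\eta \frac{1}{\sqrt{u}}\, e^{-\frac{su}{2}} h(u)\, du}{\int_{\mathbb{R}_+} \sqrt{u}\, e^{-\frac{su}{2}} h(u)\, du} + \frac{\int_\eta^\infty \frac{1}{\sqrt{u}}\, e^{-\frac{su}{2}} h(u)\, du}{\int_{\mathbb{R}_+} \sqrt{u}\, e^{-\frac{su}{2}} h(u)\, du} \\ &\le \frac{\int_0^\eta \frac{1}{\sqrt{u}}\, e^{-\frac{su}{2}} h(u)\, du}{\int_0^\eta \sqrt{u}\, e^{-\frac{su}{2}} h(u)\, du} + \frac{1}{\eta}\, \frac{\int_\eta^\infty \sqrt{u}\, e^{-\frac{su}{2}} h(u)\, du}{\int_{\mathbb{R}_+} \sqrt{u}\, e^{-\frac{su}{2}} h(u)\, du} \\ &\le \frac{C_{2,\eta}}{C_{1,\eta}}\, \frac{\int_0^\eta \frac{1}{\sqrt{u}}\, e^{-\frac{su}{2}} \tilde h(u)\, du}{\int_0^\eta \sqrt{u}\, e^{-\frac{su}{2}} \tilde h(u)\, du} + \frac{1}{\eta} \\ &\le \sqrt{\frac{\lambda}{\lambda^*}} \sqrt{\frac{\lambda}{\lambda^*}}\, \frac{\int_{\mathbb{R}_+} \frac{1}{\sqrt{u}}\, e^{-\frac{su}{2}} \tilde h(u)\, du}{\int_{\mathbb{R}_+} \sqrt{u}\, e^{-\frac{su}{2}} \tilde h(u)\, du} + \frac{1}{\eta} \\ &= \frac{\lambda}{\lambda^*}\, \frac{\int_{\mathbb{R}_+} \frac{1}{\sqrt{u}}\, e^{-\frac{su}{2}} \tilde h(u)\, du}{\int_{\mathbb{R}_+} \sqrt{u}\, e^{-\frac{su}{2}} \tilde h(u)\, du} + \frac{1}{\eta} . \end{aligned} $$

Since $\tilde h \in \mathcal{A}(\lambda^*)$, there exists $k$ such that

$$ \frac{\int_{\mathbb{R}_+} \frac{1}{\sqrt{u}}\, e^{-\frac{su}{2}} h(u)\, du}{\int_{\mathbb{R}_+} \sqrt{u}\, e^{-\frac{su}{2}} h(u)\, du} \le \frac{\lambda}{\lambda^*} (\lambda^* s + k) + \frac{1}{\eta} = \lambda s + \frac{\lambda}{\lambda^*} k + \frac{1}{\eta} \tag{12} $$

for every $s > s_\eta$. Our assumptions imply that $\int_{\mathbb{R}_+} \frac{1}{\sqrt{u}}\, \tilde h(u)\, du < \infty$. Together with (10), this leads to $\int_{\mathbb{R}_+} \frac{1}{\sqrt{u}}\, h(u)\, du < \infty$. Then, since

$$ \sup_{s \in (0, s_\eta)} \frac{\int_{\mathbb{R}_+} \frac{1}{\sqrt{u}}\, e^{-\frac{su}{2}} h(u)\, du}{\int_{\mathbb{R}_+} \sqrt{u}\, e^{-\frac{su}{2}} h(u)\, du} \le \sup_{s \in (0, s_\eta)} \frac{\int_{\mathbb{R}_+} \frac{1}{\sqrt{u}}\, h(u)\, du}{e^{-\frac{s\eta}{2}} \int_0^\eta \sqrt{u}\, h(u)\, du} \le \frac{e^{\frac{s_\eta \eta}{2}} \int_{\mathbb{R}_+} \frac{1}{\sqrt{u}}\, h(u)\, du}{\int_0^\eta \sqrt{u}\, h(u)\, du} < \infty , $$

it follows from (12) that $h \in \mathcal{A}(\lambda)$. Hence, $\lambda_h \le \lambda$. Since $\lambda \in (\lambda_{\tilde h}, \infty)$ was arbitrarily chosen, it follows that $\lambda_h \le \lambda_{\tilde h}$.

Now assume that $\lambda_h < \infty$. We can show that $\lambda_{\tilde h} \le \lambda_h$ by noting that

$$ \lim_{u \to 0} \frac{h(u)}{\tilde h(u)} \in (0, \infty) \iff \lim_{u \to 0} \frac{\tilde h(u)}{h(u)} \in (0, \infty) , $$

and reversing the roles of $h$ and $\tilde h$ in the above argument. We have shown that $\lambda_h < \infty$ if and only if $\lambda_{\tilde h} < \infty$, and when they are finite, they are equal. □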


Corollary 3. Let $h$ be a mixing density that satisfies condition $\mathcal{M}$. If $h$ is polynomial near the origin with power $c > \frac{n - p + 2a - d - 1}{2}$, then the posterior distribution is proper and the DA Markov chain is geometrically ergodic.

Proof. We can write

$$ \frac{\int_0^\infty u^{\frac{d-2}{2}} e^{-\frac{su}{2}} h(u)\, du}{\int_0^\infty u^{\frac{d}{2}} e^{-\frac{su}{2}} h(u)\, du} = \frac{\int_0^\infty \frac{1}{\sqrt{u}}\, e^{-\frac{su}{2}} h^*(u)\, du}{\int_0^\infty \sqrt{u}\, e^{-\frac{su}{2}} h^*(u)\, du} , \tag{13} $$

where $h^*(u)$ is the mixing density that is proportional to $u^{\frac{d-1}{2}} h(u)$. It's easy to see that $h^*$ is polynomial near the origin with power $c' = c + \frac{d-1}{2} > \frac{n - p + 2a - 2}{2}$. (Note that (N2) implies that $c' > 0$, so the integral in the numerator on the right-hand side of (13) is finite.) Let $\tilde h$ be the Gamma$(c' + 1, 1)$ density, which is clearly polynomial near the origin with power $c'$. Then,

$$ \lim_{u \to 0} \frac{h^*(u)}{\tilde h(u)} = \lim_{u \to 0} \frac{h^*(u)}{u^{c'}}\, \frac{u^{c'}}{\tilde h(u)} \in (0, \infty) . $$

Thus, (9) and Lemma 1 imply that $\lambda_{h^*} = \lambda_{\tilde h} = 1/(2c' + 1)$, and the result now follows from Proposition 1 since

$$ \lambda_{h^*} = \frac{1}{2c' + 1} < \frac{1}{n - p + 2a - 1} . $$

□


4.3 Case III: Faster than polynomial near the origin 

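Lemma 2. Suppose that $h$ and $\tilde h$ are two mixing densities that are both strictly positive in a neighborhood of zero. If there exists $\eta > 0$ such that $h/\tilde h$ is a strictly increasing function on $(0, \eta]$, then $\lambda_h \le \lambda_{\tilde h}$.

Proof. First, fix $s > 0$ and define two densities as follows: $h_{s,\eta}(u) = K_{s,\eta}\, e^{-\frac{su}{2}} h(u)\, I_{(0,\eta)}(u)$ and $\tilde h_{s,\eta}(u) = \tilde K_{s,\eta}\, e^{-\frac{su}{2}} \tilde h(u)\, I_{(0,\eta)}(u)$, where $K_{s,\eta}$ and $\tilde K_{s,\eta}$ are normalizing constants. Since $h/\tilde h$ is strictly increasing on $(0, \eta]$, it follows that

$$ \frac{h_{s,\eta}(u)}{\tilde h_{s,\eta}(u)} = \frac{h(u)}{\tilde h(u)}\, \frac{K_{s,\eta}}{\tilde K_{s,\eta}} \begin{cases} < 1 & \text{if } u \in (0, u^*) \\ > 1 & \text{if } u \in (u^*, \eta) \end{cases} $$

for some $u^* \in (0, \eta)$. This shows that the densities $h_{s,\eta}$ and $\tilde h_{s,\eta}$ cross exactly once in the interval $(0, \eta)$, which is their common support. It follows that a random variable with density $\tilde h_{s,\eta}$ is stochastically dominated by a random variable with density $h_{s,\eta}$. This stochastic dominance implies that

$$ \int_0^\eta \frac{1}{\sqrt{u}}\, \tilde h_{s,\eta}(u)\, du \ge \int_0^\eta \frac{1}{\sqrt{u}}\, h_{s,\eta}(u)\, du \quad \text{and} \quad \int_0^\eta \sqrt{u}\, \tilde h_{s,\eta}(u)\, du \le \int_0^\eta \sqrt{u}\, h_{s,\eta}(u)\, du . \tag{14} $$

Now define two more densities as follows

$$ h_\eta(u) = \frac{h(u)}{\int_0^\eta h(v)\, dv}\, I_{(0,\eta)}(u) \quad \text{and} \quad \tilde h_\eta(u) = \frac{\tilde h(u)}{\int_0^\eta \tilde h(v)\, dv}\, I_{(0,\eta)}(u) . $$

It follows from (14) that

$$ \frac{\int_{\mathbb{R}_+} \frac{1}{\sqrt{u}}\, e^{-\frac{su}{2}} h_\eta(u)\, du}{\int_{\mathbb{R}_+} \sqrt{u}\, e^{-\frac{su}{2}} h_\eta(u)\, du} = \frac{\int_{\mathbb{R}_+} \frac{1}{\sqrt{u}}\, h_{s,\eta}(u)\, du}{\int_{\mathbb{R}_+} \sqrt{u}\, h_{s,\eta}(u)\, du} \le \frac{\int_{\mathbb{R}_+} \frac{1}{\sqrt{u}}\, \tilde h_{s,\eta}(u)\, du}{\int_{\mathbb{R}_+} \sqrt{u}\, \tilde h_{s,\eta}(u)\, du} = \frac{\int_{\mathbb{R}_+} \frac{1}{\sqrt{u}}\, e^{-\frac{su}{2}} \tilde h_\eta(u)\, du}{\int_{\mathbb{R}_+} \sqrt{u}\, e^{-\frac{su}{2}} \tilde h_\eta(u)\, du} . $$

Hence, $\lambda_{h_\eta} \le \lambda_{\tilde h_\eta}$. Since

$$ \lim_{u \to 0} \frac{h(u)}{h_\eta(u)} = \int_0^\eta h(v)\, dv \in \mathbb{R}_+ \quad \text{and} \quad \lim_{u \to 0} \frac{\tilde h(u)}{\tilde h_\eta(u)} = \int_0^\eta \tilde h(v)\, dv \in \mathbb{R}_+ , $$

it follows from Lemma 1 that $\lambda_h = \lambda_{h_\eta}$ and $\lambda_{\tilde h} = \lambda_{\tilde h_\eta}$. □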


Corollary 4. Let $h$ be a mixing density that satisfies condition $\mathcal{M}$. If $h$ is faster than polynomial near the origin, then the posterior distribution is proper and the DA Markov chain is geometrically ergodic.

Proof. Again, define $h^*(u)$ to be the mixing density that is proportional to $u^{\frac{d-1}{2}} h(u)$. In light of (13), it suffices to show that $\lambda_{h^*} = 0$. First, note that $h^*$ is faster than polynomial near the origin. Fix $c > 0$ and define $\tilde h(u) = (c + 1)\, u^c\, I_{(0,1)}(u)$. Clearly, $\lambda_{\tilde h} = \frac{1}{2c + 1}$. Since $h^*$ is faster than polynomial near the origin, there exists $\eta_c \in (0, 1)$ such that $\frac{h^*(u)}{\tilde h(u)}$ is strictly increasing in $(0, \eta_c)$. Thus, Lemma 2 implies that $\lambda_{h^*} \le \lambda_{\tilde h} = \frac{1}{2c + 1}$. But $c$ was arbitrary, so $\lambda_{h^*} = 0$. The result now follows immediately from Proposition 1. □

Taken together, Corollaries 2, 3 and 4 are equivalent to Theorem 1. Hence, our proof of Theorem 1 is complete.
5 Examples and a result concerning mixtures of mixing densities

We claimed in the Introduction that every mixing density which is a member of a standard parametric family is either polynomial near the origin, or faster than polynomial near the origin. Here we provide some details. When we write $W \sim \text{Gamma}(\alpha, \gamma)$, we mean that $W$ has density proportional to $w^{\alpha - 1} e^{-\gamma w} I_{\mathbb{R}_+}(w)$. By $W \sim \text{Beta}(\alpha, \gamma)$, we mean that the density is proportional to $w^{\alpha - 1} (1 - w)^{\gamma - 1} I_{(0,1)}(w)$, and by $W \sim \text{Weibull}(\alpha, \gamma)$, we mean that the density is proportional to $w^{\alpha - 1} e^{-\gamma w^\alpha} I_{\mathbb{R}_+}(w)$. In all three cases, we need $\alpha, \gamma > 0$. It is clear that these densities are all polynomial near the origin with $c = \alpha - 1$. Moreover, condition $\mathcal{M}$ always holds. Hence, according to Theorem 1, if the mixing density is Gamma$(\alpha, \gamma)$, Beta$(\alpha, \gamma)$ or Weibull$(\alpha, \gamma)$ with $\alpha > \frac{n - p + 2a - d + 1}{2}$, then the DA Markov chain is geometrically ergodic.

By $W \sim F(\nu_1, \nu_2)$, we mean that $W$ has density proportional to

\[
\frac{w^{(\nu_1 - 2)/2}}{\left( 1 + \frac{\nu_1}{\nu_2} w \right)^{(\nu_1 + \nu_2)/2}} \, I_{\mathbb{R}_+}(w) ,
\]

where $\nu_1, \nu_2 > 0$. These densities are polynomial near the origin with $c = (\nu_1 - 2)/2$. To get a
geometric chain in this case, we need $\nu_1 > n - p + 2a - d + 1$ and $\nu_2 > d$. (The second condition
is to ensure that condition $\mathcal{M}$ holds.) Consider the shifted Pareto family with density given by

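\[
\frac{\gamma \alpha^{\gamma}}{(w + \alpha)^{\gamma + 1}} \, I_{\mathbb{R}_+}(w) ,
\]

where $\alpha, \gamma > 0$. This density is polynomial near the origin with $c = 0$. Since the requirement that
$c > \frac{n - p + 2a - d - 1}{2}$ forces $c$ to be strictly positive, Theorem 1 is not applicable to this family.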
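The parameter conditions quoted above for the families that are polynomial near the origin can be collected in a small checker. This is only a sketch: the function names are hypothetical, and it assumes, as in the text, that Theorem 1 requires $c > (n - p + 2a - d - 1)/2$, where $a$ is the prior hyperparameter appearing in that bound, and that the $F$ family additionally needs $\nu_2 > d$.

```python
def polynomial_threshold(n, p, a, d):
    """Value that the exponent c must strictly exceed: (n - p + 2a - d - 1) / 2."""
    return (n - p + 2 * a - d - 1) / 2


def power_family_ok(alpha, n, p, a, d):
    """Gamma, Beta or Weibull with first parameter alpha, for which c = alpha - 1."""
    return alpha - 1 > polynomial_threshold(n, p, a, d)


def f_family_ok(nu1, nu2, n, p, a, d):
    """F(nu1, nu2): c = (nu1 - 2) / 2, plus nu2 > d for the moment condition."""
    return (nu1 - 2) / 2 > polynomial_threshold(n, p, a, d) and nu2 > d


def shifted_pareto_ok(n, p, a, d):
    """Shifted Pareto has c = 0, so the check fails whenever the threshold is >= 0."""
    return 0 > polynomial_threshold(n, p, a, d)


if __name__ == "__main__":
    # Illustrative values only: n = 10 observations, p = 2, a = 1.5, d = 2.
    print(polynomial_threshold(10, 2, 1.5, 2))   # 4.0
    print(power_family_ok(5.5, 10, 2, 1.5, 2))   # True
    print(f_family_ok(11, 2, 10, 2, 1.5, 2))     # False, because nu2 is not > d
```

With these illustrative values the threshold is 4, so a Gamma mixing density needs $\alpha > 5$ and an $F$ mixing density needs $\nu_1 > 10$ and $\nu_2 > 2$.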

By $W \sim \text{IG}(\alpha, \gamma)$, we mean that $W$ has density proportional to $w^{-\alpha-1} e^{-\gamma/w} I_{\mathbb{R}_+}(w)$, where
$\alpha, \gamma > 0$. For any $c > 0$, the derivative of $\log\big(h(w)/w^c\big)$ is

\[
-\frac{\alpha + c + 1}{w} + \frac{\gamma}{w^2} = \frac{1}{w} \left[ -(\alpha + c + 1) + \frac{\gamma}{w} \right] ,
\]

which is clearly strictly positive in a neighborhood of zero. Hence, the IG$(\alpha, \gamma)$ densities are all
faster than polynomial near the origin. Thus, Theorem 1 implies that, as long as $\alpha > d/2$, the DA
Markov chain is geometrically ergodic.
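The sign claim for the inverted gamma case can be checked numerically. The sketch below works with the unnormalized kernel $w^{-\alpha-1} e^{-\gamma/w}$ and uses hypothetical function names. It compares the analytic derivative of $\log(h(w)/w^c)$ with a central finite difference, and reports the point $\gamma/(\alpha + c + 1)$ below which that derivative is positive.

```python
import math


def log_ratio(w, alpha, gamma, c):
    """log(h(w) / w**c) for the kernel h(w) = w**(-alpha - 1) * exp(-gamma / w)."""
    return -(alpha + 1 + c) * math.log(w) - gamma / w


def log_ratio_derivative(w, alpha, gamma, c):
    """Analytic derivative: -(alpha + c + 1) / w + gamma / w**2."""
    return -(alpha + c + 1) / w + gamma / w**2


def increasing_region_end(alpha, gamma, c):
    """The derivative is strictly positive exactly for 0 < w < gamma / (alpha + c + 1)."""
    return gamma / (alpha + c + 1)


if __name__ == "__main__":
    alpha, gamma, c = 1.5, 2.0, 3.0  # illustrative values only
    w, step = 0.2, 1e-6
    finite_diff = (log_ratio(w + step, alpha, gamma, c)
                   - log_ratio(w - step, alpha, gamma, c)) / (2 * step)
    print(finite_diff, log_ratio_derivative(w, alpha, gamma, c))
    print(increasing_region_end(alpha, gamma, c))
```

Because the positive region $(0, \gamma/(\alpha + c + 1))$ is nonempty for every $c > 0$, the ratio $h(w)/w^c$ is strictly increasing near zero for every $c$, which is the property used in the text.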

By $W \sim \text{GIG}(\nu, a, b)$, we mean that $W$ has a generalized inverse Gaussian distribution with
density given by

h{w) = © 2 ”"" 1 exp {- \ (“” + © } /R + (tt,) ■ 

where $a, b \in \mathbb{R}_+$ and $\nu \in \mathbb{R}$. Taking $\nu = -\frac{1}{2}$ leads to the standard inverse Gaussian density
(with a nonstandard parametrization). By $W \sim \text{Log-normal}(\mu, \gamma)$, we mean that $W$ has density
proportional to

— exp | - — (logw;-q) 2 )}/ R+ (u;) , 
where $\mu \in \mathbb{R}$ and $\gamma > 0$. By $W \sim \text{Fréchet}(\alpha, \gamma)$, we mean that $W$ has density proportional to

w~ ( a+ i) e~I r + (w ) , 


where $\alpha, \gamma > 0$. Arguments similar to those used in the inverted gamma case above show that all
members of these three families are faster than polynomial near the origin. Moreover, condition $\mathcal{M}$
holds for all the Log-normal and GIG densities, and for all Fréchet$(\alpha, \gamma)$ densities with $\alpha > d/2$.
Thus, the corresponding DA Markov chains are all geometrically ergodic.

We end this section with a result concerning mixtures of mixing densities. 


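Proposition 2. Let $I$ be an index set equipped with a probability measure $\xi$. Consider a family of
mixing densities $\{h_a\}_{a \in I}$ such that $\lambda_{h_a} = 0$ for every $a \in I$. In particular, for every $a \in I$ and
every $\lambda \in (0,1)$, there exists $k_{a,\lambda} > 0$ such that

\[
\frac{\int_0^\infty \frac{1}{\sqrt{u}} \, e^{-\frac{su}{2}} h_a(u) \, du}{\int_0^\infty \sqrt{u} \, e^{-\frac{su}{2}} h_a(u) \, du} \le \lambda s + k_{a,\lambda}
\]

for every $s > 0$. Suppose that, for every $\lambda \in (0,1)$,

\[
\sup_{a \in I} k_{a,\lambda} < \infty . \tag{15}
\]

Then $\lambda_h = 0$ where $h(u) = \int_I h_a(u) \, \xi(da)$.

Proof. Fix $\lambda \in (0,1)$. For every $s > 0$, we have

\[
\begin{aligned}
\frac{\int_0^\infty \frac{1}{\sqrt{u}} \, e^{-\frac{su}{2}} h(u) \, du}{\int_0^\infty \sqrt{u} \, e^{-\frac{su}{2}} h(u) \, du}
&= \frac{\int_I \left( \int_0^\infty \frac{1}{\sqrt{u}} \, e^{-\frac{su}{2}} h_a(u) \, du \right) \xi(da)}{\int_0^\infty \sqrt{u} \, e^{-\frac{su}{2}} h(u) \, du} \\
&\le \frac{\int_I (\lambda s + k_{a,\lambda}) \left( \int_0^\infty \sqrt{u} \, e^{-\frac{su}{2}} h_a(u) \, du \right) \xi(da)}{\int_0^\infty \sqrt{u} \, e^{-\frac{su}{2}} h(u) \, du} \\
&\le \Big( \lambda s + \sup_{a \in I} k_{a,\lambda} \Big) \frac{\int_I \int_0^\infty \sqrt{u} \, e^{-\frac{su}{2}} h_a(u) \, du \, \xi(da)}{\int_0^\infty \sqrt{u} \, e^{-\frac{su}{2}} h(u) \, du} \\
&= \lambda s + \sup_{a \in I} k_{a,\lambda} .
\end{aligned}
\]

Since this holds for all $\lambda \in (0,1)$, the result follows.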


□ 
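The averaging step in the proof can be illustrated numerically for a finite mixture. The sketch below is an illustration under stated assumptions, not part of the argument. It reads the ratio in Proposition 2 as $\int_0^\infty u^{-1/2} e^{-su/2} h(u)\,du \big/ \int_0^\infty u^{1/2} e^{-su/2} h(u)\,du$, takes the components to be uniform densities on intervals bounded away from zero (so each component ratio is at most the reciprocal of the left endpoint of its interval), and uses a hand-written Simpson rule. All names are hypothetical.

```python
import math


def simpson(f, lo, hi, n=2000):
    """Composite Simpson rule on [lo, hi] with an even number n of subintervals."""
    step = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * step)
    return total * step / 3


def ratio_parts(s, lo, hi):
    """Numerator and denominator integrals for the uniform density on (lo, hi)."""
    height = 1.0 / (hi - lo)
    num = simpson(lambda u: height * math.exp(-s * u / 2) / math.sqrt(u), lo, hi)
    den = simpson(lambda u: height * math.exp(-s * u / 2) * math.sqrt(u), lo, hi)
    return num, den


def mixture_ratio(s, components, weights):
    """Ratio for the mixture: weighted numerators over weighted denominators."""
    parts = [ratio_parts(s, lo, hi) for lo, hi in components]
    num = sum(w * n for w, (n, _) in zip(weights, parts))
    den = sum(w * d for w, (_, d) in zip(weights, parts))
    return num / den


def max_component_ratio(s, components):
    """Largest of the component ratios at the same value of s."""
    return max(n / d for n, d in (ratio_parts(s, lo, hi) for lo, hi in components))


if __name__ == "__main__":
    components = [(0.5, 1.0), (1.0, 3.0)]  # illustrative supports only
    weights = [0.3, 0.7]
    for s in (0.0, 1.0, 10.0, 50.0):
        print(s, mixture_ratio(s, components, weights), max_component_ratio(s, components))
```

The mixture ratio is a weighted average of the component ratios, with weights proportional to the component denominators, so it never exceeds the largest component ratio. This is the finite-index analogue of replacing $k_{a,\lambda}$ by its supremum in the proof.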


Remark 2. If the index set, $I$, in Proposition 2 is a finite set, then (15) is automatically satisfied.
Here is a simple application of Proposition 2.

Proposition 3. Let $\{h_i\}_{i=1}^M$ be a finite set of mixing densities that all satisfy condition $\mathcal{M}$, and are
all either zero near the origin, or faster than polynomial near the origin. Define

\[
h(u) = \sum_{i=1}^M w_i h_i(u) ,
\]

where $w_i > 0$ and $\sum_{i=1}^M w_i = 1$. Then the posterior distribution is proper and the DA Markov
chain is geometrically ergodic.

Proof. Since Proposition |2] implies that A/, = 0, the arguments in the proof of Corollary 0] can be 
applied to prove the result. □ 
Appendix: Matrix Normal and Inverse Wishart Densities


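Matrix Normal Distribution Suppose $Z$ is an $r \times c$ random matrix with density

\[
f_Z(z) = \frac{1}{(2\pi)^{\frac{rc}{2}} |A|^{\frac{c}{2}} |B|^{\frac{r}{2}}} \exp\left[ -\frac{1}{2} \operatorname{tr}\left\{ A^{-1} (z - \theta) B^{-1} (z - \theta)^T \right\} \right] ,
\]

where $\theta$ is an $r \times c$ matrix, and $A$ and $B$ are $r \times r$ and $c \times c$ positive definite matrices. Then $Z$ is
said to have a matrix normal distribution and we denote this by $Z \sim N_{r,c}(\theta, A, B)$ (Arnold,
1981, Chapter 17).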


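Inverse Wishart Distribution Suppose $W$ is an $r \times r$ random positive definite matrix with density

\[
f_W(w) = \frac{|w|^{-\frac{m+r+1}{2}} \exp\left\{ -\frac{1}{2} \operatorname{tr}\left( \Theta^{-1} w^{-1} \right) \right\}}{2^{\frac{mr}{2}} \pi^{\frac{r(r-1)}{4}} |\Theta|^{\frac{m}{2}} \prod_{i=1}^r \Gamma\left( \frac{1}{2}(m + 1 - i) \right)} \, I_{S_r}(w) ,
\]

where $m > r - 1$ and $\Theta$ is an $r \times r$ positive definite matrix. Then $W$ is said to have an inverse
Wishart distribution and this is denoted by $W \sim \text{IW}_r(m, \Theta)$.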